
add autograd profiler #1693

Closed
wants to merge 10 commits into from

Conversation

@quinor (Contributor) commented May 1, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not required for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Adds a profiler based on torch.autograd.profiler.profile. It differs from the existing profilers in that it profiles the impact of specific PyTorch ops (including C/C++ and CUDA code) rather than just the Python code.

Closes #3917.
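For context, a minimal sketch of the underlying PyTorch API this PR builds on (standard `torch.autograd.profiler` usage; the exact wrapper added by this PR may differ):

```python
import torch
import torch.autograd.profiler as profiler

x = torch.randn(64, 64)

# Record op-level events, including C++ and CUDA kernels,
# rather than only Python-level calls.
with profiler.profile(use_cuda=torch.cuda.is_available()) as prof:
    y = x.mm(x).relu()

# Aggregate events by op name and print a summary table.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```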

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
@jeremyjordan

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks commented May 1, 2020

Hello @quinor! Thanks for updating this PR.

Line 75:121: E501 line too long (132 > 120 characters)
Line 76:121: E501 line too long (133 > 120 characters)
Line 77:121: E501 line too long (132 > 120 characters)
Line 97:121: E501 line too long (132 > 120 characters)

Comment last updated at 2020-12-02 23:27:31 UTC

mergify bot requested a review from a team May 1, 2020 13:31
quinor changed the title from [WIP] added autograd profiler to [WIP] [help needed] added autograd profiler May 1, 2020
Borda added the labels feature (Is an improvement or enhancement) and help wanted (Open to be worked on) May 5, 2020
Borda changed the title from [WIP] [help needed] added autograd profiler to [WIP] added autograd profiler May 5, 2020
@Borda (Member) commented May 5, 2020

@quinor what do you need help with? @jeremyjordan, may you have a look?

@quinor (Contributor, Author) commented May 5, 2020

Tests, docs and thread safety + (of course) a review. Please see the list at the end of the top message.

@jeremyjordan (Contributor) commented:

@Borda would you mind looking into the thread safety?

@mergify bot commented May 10, 2020

This pull request is now in conflict... :(

@Borda (Member) commented May 26, 2020

> @Borda would you mind looking into the thread safety?

what do you mean by thread-safety? In Python, unless the GIL is explicitly released, it is always safe...

@quinor (Contributor, Author) commented May 26, 2020

> @Borda would you mind looking into the thread safety?
>
> what do you mean by thread-safety? In Python, unless the GIL is explicitly released, it is always safe...

Quoted from my initial PR message:

"thread-safety (the profiler may fail if it's run in multiple threads, i.e. when there's more than one dataloader worker, the profiler can be entered with the same action_name multiple times at once, and that doesn't end well)"

Basically if someone does:

[thread1] profiler.start("name")
[thread2] profiler.start("name")
[thread1] profiler.stop("name")
[thread2] profiler.stop("name")

my profiler does break. There are 3 options I see right now:

  1. fix this profiler so that it's OK
  2. fix the way profilers are called in PL so that it doesn't happen
  3. (I think this might be the best) add some kind of UID to the profiler.{start, stop} calls so that each start-stop pair is unique. Alternatively, make start return an object that "stop" is called upon; that would require a bit of code rework, though. (See the sketch below.)
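A rough sketch of option 3, where start returns a UID that stop consumes, so concurrent start/stop pairs sharing an action_name no longer collide (the class and attribute names here are hypothetical, not the PR's actual code):

```python
import threading
import time
import uuid


class ThreadSafeProfiler:
    # Hypothetical sketch, not the PR's implementation.

    def __init__(self):
        self._active = {}     # uid -> (action_name, start time)
        self._durations = {}  # action_name -> list of elapsed seconds
        self._lock = threading.Lock()

    def start(self, action_name: str) -> str:
        # Every call gets its own UID, so two threads starting the
        # same action_name never share a bookkeeping key.
        uid = uuid.uuid4().hex
        with self._lock:
            self._active[uid] = (action_name, time.monotonic())
        return uid

    def stop(self, uid: str) -> None:
        now = time.monotonic()
        with self._lock:
            action_name, started = self._active.pop(uid)
            self._durations.setdefault(action_name, []).append(now - started)
```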

@edenlightning (Contributor) commented:

Hey @quinor, do you still need help with this? Mind taking a look?

@quinor (Contributor, Author) commented Jun 10, 2020

@edenlightning please see my last message. I'm not deep enough into Lightning to know how to fix the issues I pinpointed, and I don't think it can be merged without fixing them, so help with that would really be appreciated. I thought @Borda was going to look into that.

awaelchli added this to the 0.9.x milestone Aug 8, 2020
edenlightning assigned Borda and unassigned jeremyjordan Aug 10, 2020
@edenlightning (Contributor) commented:

@Borda keep this PR?

@sachit-menon commented:

Just came across this linked in the Slack - this would be very cool functionality if it could end up merged.

@quinor (Contributor, Author) commented Sep 7, 2020

> Just came across this linked in the Slack - this would be very cool functionality if it could end up merged.

Feel free to work on it; I'm not deep enough in Lightning to fix the issues I pinpointed :/

@sachit-menon commented:

I'm not either, unfortunately, but the issue with the profiler and num_workers > 0 seems like it may have deeper roots: pytorch/pytorch#6313

@Borda (Member) commented Sep 24, 2020

I'll have a look once we fix master so we can easily debug the issue here...

@codecov bot commented Sep 24, 2020

Codecov Report

Merging #1693 (ddf8fb8) into master (7b7d4bb) will decrease coverage by 0%.
The diff coverage is 96%.

@@          Coverage Diff           @@
##           master   #1693   +/-   ##
======================================
- Coverage      93%     93%   -0%     
======================================
  Files         124     124           
  Lines        9303    9355   +52     
======================================
+ Hits         8616    8662   +46     
- Misses        687     693    +6     

Borda added this to the 1.1.x milestone Nov 30, 2020
@Borda (Member) commented Nov 30, 2020

@quinor how is it going? Do you think we can get this done by the end of this week so it makes it into 1.1? 🐰

@quinor (Contributor, Author) commented Nov 30, 2020

@Borda I'll try to get the code ready on Wednesday + hopefully some docs. I hope we can then close it by the end of the week.

@quinor (Contributor, Author) commented Dec 2, 2020

I see some tests failing for PyTorch 1.3 (old PyTorch's autograd.profiler does not support profile_memory) - what should we do about that? Support was added in pytorch/pytorch#37775, which is PyTorch 1.6, I think...
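A possible version guard for those failures (illustrative only; the check and kwarg plumbing are assumptions, not this PR's actual code):

```python
import torch
from distutils.version import LooseVersion

# profile_memory landed in torch.autograd.profiler.profile in PyTorch 1.6
# (pytorch/pytorch#37775), so only pass it on new enough versions.
profiler_kwargs = {"use_cuda": torch.cuda.is_available()}
if LooseVersion(torch.__version__) >= LooseVersion("1.6.0"):
    profiler_kwargs["profile_memory"] = True

with torch.autograd.profiler.profile(**profiler_kwargs) as prof:
    torch.randn(32, 32).mm(torch.randn(32, 32))
```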

quinor changed the title from [WIP] added autograd profiler to add autograd profiler Dec 2, 2020
@quinor (Contributor, Author) commented Dec 4, 2020

@Borda it's ready from my side; some issues (tests failing due to old PyTorch) are still pending that I couldn't resolve by myself. Do you still want to push it into 1.1?

@tchaton (Contributor) commented Jan 19, 2021

Hey @quinor,

I have a similar PR here: #5560. Let's merge them together.
Please join Slack so we can coordinate there :)

Best,
T.C

@quinor (Contributor, Author) commented Jan 22, 2021

Closing; #5560 implements this profiler.

quinor closed this Jan 22, 2021
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Development
Successfully merging this pull request may close: Support PyTorch Profiler

9 participants